Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features
Authors
Abstract
This study proposes a cross-domain multi-objective speech assessment model, called MOSA-Net, which can simultaneously estimate the quality, intelligibility, and distortion scores of an input signal. MOSA-Net comprises a convolutional neural network and bidirectional long short-term memory architecture for representation extraction, followed by a multiplicative attention layer and a fully connected layer for each metric prediction. Additionally, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned (SSL) models are used as inputs, combining rich acoustic information to obtain more accurate assessments. Experimental results show that, in both seen and unseen noise environments, MOSA-Net improves the linear correlation coefficient (LCC) of perceptual evaluation of speech quality (PESQ) prediction compared with Quality-Net, an existing single-task model for PESQ prediction, and improves the LCC of short-time objective intelligibility (STOI) prediction compared with STOI-Net, an existing single-task model for STOI prediction. Moreover, MOSA-Net can serve as a pre-trained model that is effectively adapted to predicting subjective scores with a limited amount of training data. For mean opinion score (MOS) prediction, MOSA-Net improves the LCC compared with MOS-SSL, a strong single-task MOS prediction model. We further adopt MOSA-Net to guide the speech enhancement (SE) process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach. QIA-SE outperforms the baseline SE system, with improved performance over the baseline model in both seen and unseen noise environments.
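The LCC gains reported above are, under the usual definition, Pearson correlations between a model's predicted scores and the ground-truth scores over a test set: r = Σ(p − p̄)(t − t̄) / sqrt(Σ(p − p̄)² · Σ(t − t̄)²). The following is a minimal sketch assuming that standard definition; the function name `lcc` and the sample scores are hypothetical illustrations and are not taken from the paper's evaluation code or results. Because MOSA-Net predicts several metrics at once, LCC is computed separately for each metric head.

```python
import math
from typing import Sequence


def lcc(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (LCC) between predicted and true scores.

    Hypothetical helper for illustration; assumes the standard Pearson definition.
    """
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    # Covariance numerator and the two variance terms (unnormalized; the 1/n cancels).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("LCC is undefined when either sequence is constant")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Invented per-utterance scores for two metric heads (not results from the paper).
    scores = {
        "PESQ": ([1.8, 2.9, 3.6], [1.7, 3.1, 3.5]),
        "STOI": ([0.62, 0.81, 0.93], [0.60, 0.85, 0.91]),
    }
    for metric, (pred, true) in scores.items():
        print(f"{metric} LCC = {lcc(pred, true):.3f}")
```

An LCC close to 1 means the predicted scores track the reference metric almost linearly; a model comparison such as MOSA-Net versus Quality-Net amounts to computing this value for each model on the same test set.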
Similar resources
Perceptually-based objective measure for non-intrusive speech quality assessment
This paper proposes a new perceptually-based method for assessing speech quality and evaluates its performance. The method is based on comparing the received speech to an appropriate reference representing the closest match from a preformulated codebook. The codebook holds a number of optimally clustered speech parameter vectors extracted from a large number of various undistorted clean speech r...
Output-Based Objective Measure for Non-Intrusive Speech Quality Evaluation
This paper describes a newly developed output-based method for non-intrusive evaluation of speech quality of voice communication systems, and evaluates its performance. The method, which uses only the output of the system, is based on measuring perceptually motivated objective auditory distances between the voiced parts of the speech signal whose quality is to be evaluated to appropriately matchin...
Non-intrusive Speech Quality Assessment in Simplified E-Model
The E-model brings a modern approach to the computation of estimated quality, allowing for easy implementation. One of its advantages is that it can be applied in real time. The method is based on a mathematical computation model evaluating transmission path impairments influencing the speech signal, especially delays and packet losses. These parameters, common in an IP network, can affect speech q...
Non-Intrusive SOM-Based Speech Quality Assessment for Telephony Applications
A non-intrusive method for speech quality assessment in telephony applications is proposed and its performance evaluated. The method involves measuring perception-based objective auditory distances between the voiced parts of the processed (degraded) speech signal and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering large ...
Multi-Objective Deep Reinforcement Learning
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the fir...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2023
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2022.3205757